Ensemble Estimation of Distributional Functionals via $k$-Nearest Neighbors

Authors

  • Kevin R. Moon
  • Kumar Sricharan
  • Alfred O. Hero
Abstract

Accurate nonparametric estimation of distributional functionals (integral functionals of one or more probability distributions) has attracted recent interest because these functionals are widely applicable in signal processing, information theory, machine learning, and statistics. In particular, k-nearest neighbor (k-nn) based methods have received considerable attention due to their adaptive nature and their relatively low computational complexity. We derive the mean squared error (MSE) convergence rates of leave-one-out k-nn plug-in density estimators of a large class of distributional functionals without boundary correction. We then apply the theory of optimally weighted ensemble estimation to obtain weighted ensemble estimators that achieve the parametric MSE rate under assumptions that are competitive with the state of the art. We also present the asymptotic distributions of these estimators, which are unknown for all other k-nn based distributional functional estimators; these distributions make hypothesis testing possible.
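
The two ingredients named in the abstract, a leave-one-out k-nn plug-in estimate and a weighted combination over several values of k, can be illustrated with a minimal sketch. It is not the authors' implementation: Shannon entropy is chosen here only as one example of a distributional functional, the data are restricted to one dimension, and the function names (`knn_density_loo`, `plugin_entropy`, `ensemble_entropy`) are hypothetical. The weights are supplied by the caller and are only required to sum to one; the optimal weights derived in the paper are not reproduced.

```python
import math
import random


def knn_density_loo(xs, i, k):
    """Leave-one-out k-nn density estimate at xs[i] for 1-D data (illustrative)."""
    # Distances from xs[i] to every other sample; xs[i] itself is left out.
    dists = sorted(abs(xs[i] - x) for j, x in enumerate(xs) if j != i)
    r_k = dists[k - 1]          # distance to the k-th nearest neighbor
    volume = 2.0 * r_k          # length of the 1-D ball of radius r_k
    return k / ((len(xs) - 1) * volume)


def plugin_entropy(xs, k):
    """Plug-in Shannon entropy: minus the average log of the k-nn density estimates."""
    n = len(xs)
    return -sum(math.log(knn_density_loo(xs, i, k)) for i in range(n)) / n


def ensemble_entropy(xs, ks, weights):
    """Weighted combination of plug-in estimates computed with different k.

    Assumption: the weights are placeholders chosen by the caller, not the
    optimal weights from the paper.
    """
    if len(ks) != len(weights):
        raise ValueError("ks and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    return sum(w * plugin_entropy(xs, k) for k, w in zip(ks, weights))


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [rng.random() for _ in range(500)]   # Uniform(0, 1), true entropy 0
    print(ensemble_entropy(sample, ks=[5, 10, 20], weights=[1 / 3, 1 / 3, 1 / 3]))
```

With uniform weights the ensemble simply averages the individual estimates, so it inherits their bias; the point of the optimally weighted approach described in the abstract is to choose weights that cancel the leading bias terms.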

Similar resources

Direct Ensemble Estimation of Density Functionals

Estimating density functionals of analog sources is an important problem in statistical signal processing and information theory. Traditionally, estimating these quantities requires either making parametric assumptions about the underlying distributions or using non-parametric density estimation followed by integration. In this paper we introduce a direct nonparametric approach which bypasses t...

Optimal rates for k-NN density and mode estimation

We present two related contributions of independent interest: (1) high-probability finite sample rates for k-NN density estimation, and (2) practical mode estimators – based on k-NN – which attain minimax-optimal rates under surprisingly general distributional conditions.

A Novel Hybrid Approach for Email Spam Detection based on Scatter Search Algorithm and K-Nearest Neighbors

Because cyberspace and Internet predominate in the life of users, in addition to business opportunities and time reductions, threats like information theft, penetration into systems, etc. are included in the field of hardware and software. Security is the top priority to prevent a cyber-attack that users should initially be detecting the type of attacks because virtual environments are not moni...

Exploring the neighbor graph to improve distributional thesauri (Explorer le graphe de voisinage pour améliorer les thésaurus distributionnels) [in French]

In this paper, we address the issue of building and improving a distributional thesaurus. We first show that existing tools from the information retrieval domain can be directly used in order to build a thesaurus with state-of-the-art performance. Secondly, we focus more specifically on improving the obtained thesaurus, seen as a graph of k-nearest neighbors. By exploiting information about the...

Improving distributional thesauri by exploring the graph of neighbors

In this paper, we address the issue of building and improving a distributional thesaurus. We first show that existing tools from the information retrieval domain can be directly used in order to build a thesaurus with state-of-the-art performance. Secondly, we focus more specifically on improving the obtained thesaurus, seen as a graph of k-nearest neighbors. By exploiting information about the...

Journal title:
  • CoRR

Volume: abs/1707.03083

Publication year: 2017